OPTIMAL ADAPTIVE ESTIMATION OF A
QUADRATIC FUNCTIONAL

By T. Tony Cai and Mark G. Low 

University of Pennsylvania 

Adaptive estimation of a quadratic functional over both Besov
and $L_p$ balls is considered. A collection of nonquadratic estimators
is developed with useful bias and variance properties over
individual Besov and $L_p$ balls. An adaptive procedure is then
constructed based on penalized maximization over this collection of
nonquadratic estimators. This procedure is shown to be optimally rate
adaptive over the entire range of Besov and $L_p$ balls in the sense that
it attains certain constrained risk bounds.

1. Introduction. The problem of estimating the quadratic functional
$\int f^2$ has received much attention in the statistical literature, especially
since, in a density estimation setting, Bickel and Ritov [5] showed under
Hölder smoothness conditions that there is a breakdown in the minimax rate
of convergence. Fully efficient estimation is possible when the function
satisfies a Hölder smoothness condition with $\alpha > 1/4$. However, when
$\alpha \le 1/4$ minimax rates of convergence under mean squared error are of
the order $n^{-8\alpha/(1+4\alpha)}$.

This theory has been developed and extended in a number of important
directions which can be particularly easily described for the Gaussian
sequence model

(1)  $Y_i = \theta_i + n^{-1/2} z_i, \quad i = 1, 2, \ldots,$

where $z_i$ are i.i.d. standard normal random variables and where
$\theta = (\theta_1, \theta_2, \ldots)$ is assumed to belong either to an $L_p$ or a
Besov ball. Such models occupy a central role in the nonparametric function
estimation literature. See, for example, [22]. In this sequence model setting
estimation of the quadratic functional $Q(\theta) = \sum_{i=1}^\infty \theta_i^2$ is the
analog of estimating the functional $\int f^2$ in the density estimation model.
The $L_p$ balls are defined as

(2)  $L_p(\alpha, M) = \{\theta : (\sum_{i=1}^\infty i^{ps} |\theta_i|^p)^{1/p} \le M\},$

where $p > 0$, $\alpha > 0$, $M > 0$ and $s = \alpha + 1/2 - 1/p > 0$. Besov balls in
sequence space are typically defined in terms of a doubly indexed sequence
$\{\theta_{j,k} : j = 0, 1, \ldots, k = 0, \ldots, 2^j - 1\}$. For $p, q, \alpha, M > 0$ the
Besov ball $B^\alpha_{p,q}(M)$ is then given by

(3)  $B^\alpha_{p,q}(M) = \{\theta : (\sum_{j=0}^\infty (2^{js} (\sum_{k=0}^{2^j-1} |\theta_{j,k}|^p)^{1/p})^q)^{1/q} \le M\},$

where once again $s = \alpha + 1/2 - 1/p > 0$. In particular, Besov balls contain
as special cases a number of well-known smoothness spaces such as Hölder and
Sobolev balls. It is possible to give a unified treatment of Besov balls and
$L_p$ balls by setting in the case of Besov balls $\theta_i = \theta_{j,k}$, where
$i = 2^j + k$. Noisy observation of Besov coefficients can then still be written
as in (1). This convention is used throughout the paper, where in addition we
shall assume that $p, q, \alpha, s > 0$.

For estimation of the quadratic functional $Q(\theta)$ over Besov and $L_p$ balls
there are really two distinct cases of interest. The "dense" case corresponds
to $p \ge 2$ and the "sparse" case to $p < 2$. Previous literature has focused
primarily on the dense case where the parameter space is quadratically convex.
In such cases the minimax theory for estimating the quadratic functional
$Q(\theta)$ was well developed in [13] and [15]. In particular, this theory covers
Besov balls $B^\alpha_{p,q}(M)$ and $L_p$ balls $L_p(\alpha, M)$ when $p \ge 2$. An
important feature of this minimax theory is that optimal quadratic rules can be
found within a "small" constant factor of the minimax risk.

The minimax theory for parameter spaces which are not quadratically
convex is quite different. The near minimaxity of optimal quadratic rules
typically does not hold when the parameter space is not quadratically convex.
Cai and Low [10] develop the minimax theory in such cases over all
Besov balls and $L_p$ balls with $p < 2$. A nonquadratic minimax procedure is
given based on term-by-term thresholding. The nonquadratic procedure is
sometimes fully efficient even when optimal quadratic rules have slow rates
of convergence.

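The minimax results for estimating the quadratic functional $Q(\theta)$ over
$\Theta = L_p(\alpha, M)$ or $\Theta = B^\alpha_{p,q}(M)$ can be summarized as follows.
Set $p_* = \min\{p, 2\}$, $s_* = \alpha + 1/2 - 1/p_*$ and let

(4)  $r(\alpha, p) = \begin{cases} 1, & \text{if } \alpha p_* > 1/2, \\ 2 - \dfrac{p_*}{1 + 2 p_* s_*}, & \text{if } \alpha p_* \le 1/2. \end{cases}$

Then

(5)  $\inf_{\hat Q} \sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 \asymp n^{-r(\alpha, p)}.$

Moreover, if $\alpha p_* > 1/2$,

(6)  $\inf_{\hat Q} \sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 = 4A(\Theta) n^{-1}(1 + o(1)),$

where $A(\Theta) = \sup_{\theta \in \Theta} \sum_{i=1}^\infty \theta_i^2$ is a constant and
$4A(\Theta) n^{-1}$ is the inverse of the nonparametric Fisher information.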
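As a quick numerical companion to the rate exponent in (4), the following minimal sketch evaluates $r(\alpha, p)$ with $p_* = \min\{p, 2\}$ and $s_* = \alpha + 1/2 - 1/p_*$. The function name `rate_exponent` is ours and purely illustrative; it is not part of the paper.

```python
def rate_exponent(alpha: float, p: float) -> float:
    """Exponent r(alpha, p) in the minimax rate n**(-r), following (4)."""
    if alpha <= 0 or p <= 0:
        raise ValueError("alpha and p must be positive")
    p_star = min(p, 2.0)
    s_star = alpha + 0.5 - 1.0 / p_star
    if s_star <= 0:
        raise ValueError("s_* = alpha + 1/2 - 1/p_* must be positive")
    if alpha * p_star > 0.5:
        return 1.0  # parametric rate: fully efficient estimation is possible
    return 2.0 - p_star / (1.0 + 2.0 * p_star * s_star)


# Dense example: for p >= 2 the exponent reduces to 8*alpha / (1 + 4*alpha).
print(rate_exponent(0.1, 4.0))
# Sparse example with alpha * p <= 1/2.
print(rate_exponent(0.3, 1.5))
```

In the dense case the output agrees with the Bickel and Ritov exponent $8\alpha/(1+4\alpha)$ quoted in the Introduction.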

In comparison to minimax theory, the theory of adaptive estimation of
$Q(\theta)$ is not as well developed. Most of the progress has been made in
quadratically convex cases. Efromovich and Low [14] considered adaptive
estimation of $Q(\theta)$ over hyper-rectangles, which corresponds to $L_p$ balls
with $p = \infty$. It was shown that rate optimal adaptive estimators do not exist
and that logarithmic penalties must be paid. An adaptive procedure only paying
these logarithmic penalties was constructed using the method due to Lepski [26].
Tribouley [29] and Johnstone [21] developed an alternative adaptive procedure
based on block thresholding algorithms. Gayraud and Tribouley [18]
also used a block thresholding scheme for adaptation over Besov spaces with
$p = 2$ and $q = \infty$. Using Lepski's method to choose within a collection of
quadratic rules, Klemelä [23] considered sharp adaptation for $L_p$ balls with
$p \ge 2$.

All of the results mentioned so far focus on quadratically convex cases
where $p \ge 2$. The sparse case where $p < 2$ presents some major new
difficulties which require a novel approach for the construction of adaptive
procedures. The goal of the present work is to develop a procedure which
adapts simultaneously over all Besov and $L_p$ balls. This problem is
significantly different from adaptation only over the dense cases where one can
select from a collection of quadratic estimators. In the sparse case even
minimax theory requires nonquadratic rules.

It is well known from previous work that block thresholding is an effective
tool for adaptive estimation of $Q(\theta)$ in the dense case. Block thresholding
can be used to guard against the worst case when there are a large number
of small coefficients and where the exact location of these coefficients is
unknown. On the other hand, in the sparse case, as shown in [10], the worst
case occurs when there are a relatively small number of large coefficients
with unknown location. In such cases term-by-term thresholding is effective.
Unfortunately term-by-term thresholding does not work well for the dense
case, and likewise, block thresholding does not work well in the sparse case.
In order to develop a procedure that can adapt simultaneously over both
the sparse and the dense cases we incorporate both approaches.

There are three parts to the adaptive estimator given in this paper. The
initial component is based on a simple unbiased quadratic estimate of the
first part of the quadratic functional. The third component is based on a
term-by-term thresholding procedure with slowly growing threshold. The
most important component, at least for the sparse cases, is an estimate of
the middle part of the quadratic functional $Q(\theta)$. This estimate is based
on penalized maximization over a collection of estimators, each of which
uses both block thresholding and term-by-term thresholding. We show that
the resulting procedure simultaneously attains the benchmarks for adaptive
estimation given in Section 2. In particular, it is fully efficient over the
largest collection of Besov and $L_p$ balls for which efficient estimators exist
while paying minimal penalty over all other Besov and $L_p$ balls.

More precisely, it follows from the theorems in the paper that the adaptive
estimator satisfies for $\Theta = L_p(\alpha, M)$ or $\Theta = B^\alpha_{p,q}(M)$

(7)  $\sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 \le C n^{-r(\alpha,p)} (\log n)^{2 p_* s_*/(1 + 2 p_* s_*)}$

when $\alpha p_* \le 1/2$ and

(8)  $\sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 = 4A(\Theta) n^{-1}(1 + o(1))$

when $\alpha p_* > 1/2$, where $A(\Theta) = \sup_{\theta \in \Theta} \sum_{i=1}^\infty \theta_i^2$. In other
words, the estimator is adaptively fully efficient over all Besov bodies where
efficient estimation is possible and only pays a logarithmic penalty when the
minimax rate is slower than $n^{-1}$. In fact, it is also shown in the present
paper that the upper bound given in (7) is rate sharp.

It is interesting to compare these results with those of an estimator based
on model selection given in [25]. Their procedure, say $\hat Q_{LM}$, maximizes
penalized quadratic estimators. It was shown that for $p < 2$,

(9)  $\sup_{\theta \in L_p(\alpha, M)} E_\theta(\hat Q_{LM} - Q(\theta))^2 \le C \min\{ (\frac{\log n}{n^2})^{4s/(1+4s)}, (\frac{\log n}{n})^{4\alpha/(1+2\alpha)} \} + \frac{C}{n}$

for some constant $C > 0$. A comparison shows that these upper bounds are
always larger than the upper bounds given in the present paper whenever
the estimator $\hat Q_{LM}$ of [25] has a bound larger than $O(n^{-1})$. A more
detailed comparison is given in Section 3.3.

The paper is organized as follows. In Section 2 we develop benchmarks
for the evaluation of adaptive estimators. The major focus of the paper
is the construction of an adaptive estimator which is described in detail in
Section 3. A collection of nonquadratic estimators with specific bias and
variance properties is constructed. The adaptive procedure is then built by
selecting a penalized estimator over this collection through maximization. We
show that the procedure simultaneously attains the benchmarks given in Section 2
over all Besov and $L_p$ balls. In particular, it is fully efficient over the
largest collection of Besov and $L_p$ balls for which efficient estimators exist
while paying minimal penalty over all other Besov and $L_p$ balls. Proofs are
given in Section 4.

2. The cost of adaptation in the sparse case. The primary goal of the
present work is to construct estimators of $Q(\theta) = \sum_{i=1}^\infty \theta_i^2$ which are
adaptive over all Besov and $L_p$ balls. This goal, however, needs to be made
precise because even in the dense case of $p \ge 2$ it is well known that fully
minimax rate optimal adaptation of the quadratic functional
$Q(\theta) = \sum_{i=1}^\infty \theta_i^2$ is not possible. See, for example, [14]. A penalty
must be paid over $L_p$ or Besov balls with $p \ge 2$ and $\alpha < 1/4$. Hence in the
present context a rate adaptive estimator is one which attains well defined
lower bounds.

In this section we shall develop the appropriate lower bounds needed as
a benchmark for the evaluation of adaptive procedures which are given in
Section 3.2. We shall see that an entirely similar phenomenon occurs in the
sparse case, although the exponent of the logarithmic penalty is different.
In particular, the following theorem shows that fully rate adaptive
estimation of the quadratic functional $Q(\theta) = \sum_{i=1}^\infty \theta_i^2$ is not possible
over any pair of Besov spaces which have different minimax rates of
convergence. In the following theorem denote by 0 the zero vector. Then $E_0$
denotes the expectation under the sequence model (1) when $\theta = 0$.

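Theorem 1. Let $\hat Q$ be an estimator of the quadratic functional
$Q(\theta) = \sum_{i=1}^\infty \theta_i^2$. Let $r(\alpha, p)$ be the minimax rate for estimating
$Q(\theta)$ over $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$. Suppose that

(10)  $E_0(\hat Q - Q(0))^2 \le C n^{-\gamma}$

for some constants $\gamma > r(\alpha, p)$ and $C > 0$. Then the maximum squared bias
over $\Theta$ satisfies, for some constant $C' > 0$,

(11)  $\sup_{\theta \in \Theta} (E_\theta \hat Q - Q(\theta))^2 \ge C' n^{-r(\alpha,p)} (\log n)^{2 p_* s_*/(1 + 2 p_* s_*)}.$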
The theorem makes clear that rate optimal estimators over one Besov or
$L_p$ ball must pay a logarithmic penalty for the maximum risk over all Besov
and $L_p$ balls which have slower minimax rates of convergence. In fact, as
shown in (11), this logarithmic penalty must be paid in terms of maximum
squared bias.

The major use of the lower bound given in the above theorem is as a
benchmark for the development of an adaptive estimator. Adaptive estimators
which attain these bounds must over each parameter space inflate the
maximum bias over that parameter space. In the next section we shall use
this fact to guide us in the development of estimators which are adaptive in
the sense that they attain the lower bound given in Theorem 1.

The proof of this theorem also immediately yields the following corollary
which shows the "inflexibility" of minimax rate optimal estimators, at least
in cases where the minimax rate is slower than $n^{-1}$. In particular, there does
not exist an estimator which attains the exact minimax rate of convergence
over any pair of Besov or $L_p$ balls which have different minimax rates of
convergence.

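Corollary 1. Let $\hat Q$ be a minimax rate optimal estimator of the quadratic
functional $Q(\theta) = \sum_{i=1}^\infty \theta_i^2$ over $\Theta = B^\alpha_{p,q}(M)$ or
$\Theta = L_p(\alpha, M)$ where the minimax rate $r(\alpha, p) < 1$. That is,

(12)  $\sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 \le D n^{-r(\alpha,p)}$

for some $D > 0$. Then

(13)  $E_0(\hat Q - Q(0))^2 \ge D' n^{-r(\alpha,p)}$

for some $D' > 0$ and hence

(14)  $\sup_{\theta \in \Theta'} E_\theta(\hat Q - Q(\theta))^2 \ge D' n^{-r(\alpha,p)},$

where $\Theta'$ is any Besov or $L_p$ ball.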

The benchmark given in Theorem 1 is useful for the evaluation of adaptive
procedures over parameter spaces which have a minimax rate of convergence
slower than $n^{-1}$. On the other hand, over Besov and $L_p$ balls with
$\alpha p_* > 1/2$, the minimax risk given in (6) is another useful benchmark.
Estimators attaining (6) can be termed efficient since they attain a
nonparametric information bound as given, for example, in [4]. See also [10].
3. The construction of an adaptive procedure. The major goal of the
present paper is the construction of an estimator of $Q(\theta)$ which adapts over
all Besov and $L_p$ balls. The development of such an adaptive estimator can
perhaps best be understood by breaking this construction into two stages. In
the first stage a collection of nonquadratic estimators is constructed using
both block thresholding and term-by-term thresholding. These estimators
have precise bias and variance properties. More specifically, for a given Besov
or $L_p$ ball when the minimax rate is slower than $n^{-1}$, one of the estimators
in the collection has maximum squared bias attaining the lower bound given
in (11) and variance smaller than the minimax risk. On the other
hand, when fully efficient estimation is possible, one of the estimators has
negligible bias and the variance attains the minimax lower bound. The
construction of these nonquadratic estimators is given in Section 3.1.

These nonquadratic estimators are then used to build an adaptive proce- 
dure. At this stage the adaptive estimator is created by maximizing penalized 
versions of these nonquadratic estimators where the penalty is chosen to be 
a logarithmic factor of the standard deviation of each of these estimators. 

The general approach of model selection via penalization has been shown
to be effective for a number of adaptive function estimation problems. See,
for example, [1, 3, 6, 25]. In particular, a major advance in estimating the
quadratic functional $Q(\theta)$ was made in [25], where it was shown that
maximizing penalized quadratic estimators of $Q(\theta)$ can yield a procedure which
is adaptive over certain Besov and $L_p$ balls. It is shown in Section 3.2 that
the procedure based on maximizing the penalized nonquadratic estimators
is adaptive over all Besov and $L_p$ bodies. A comparison with the estimator
of Laurent and Massart [25] is given in Section 3.3.

3.1. Nonquadratic estimators with specific bias and variance properties.
We start with the construction of a collection of estimators which have
precise bias and variance properties. These estimators incorporate both block
and term-by-term thresholding. It is known that block thresholding estimators
can perform well for dense cases, that is, when $p \ge 2$, and that term-by-term
thresholding estimators can be minimax rate optimal for sparse cases,
that is, when $p < 2$. By combining block thresholding and term-by-term
thresholding, estimators can be constructed which trade bias and variance
in very useful ways for both the dense and sparse cases. More specifically,
for a given Besov or $L_p$ ball we build an estimator that has inflated maximum
squared bias and reduced variance and which in particular attains the
adaptive rate of convergence for mean squared error.

It is useful to break the problem of estimating $Q(\theta)$ into three components
as follows. Let $m_0 = (\log n)^2$ and $m_k = 2^k m_0$ for $k \ge 1$. Divide the indices
$i$ beyond $m_0$ into blocks of increasing sizes so that the $k$th block is of size
$m_k$. Let $J$ be the largest integer satisfying $2^J \le n$ and set

(15)  $\xi_0 = \sum_{i=1}^{m_0} \theta_i^2, \quad \xi_{\mathrm{mid}} = \sum_{i=m_0+1}^{m_J} \theta_i^2 \quad \text{and} \quad \xi_{\mathrm{tail}} = \sum_{i=m_J+1}^{\infty} \theta_i^2.$

Then clearly $Q(\theta) = \xi_0 + \xi_{\mathrm{mid}} + \xi_{\mathrm{tail}}$. We shall use different
strategies for estimating the three components $\xi_0$, $\xi_{\mathrm{mid}}$ and
$\xi_{\mathrm{tail}}$. Estimation of $\xi_{\mathrm{mid}}$ is the most involved and so we shall first
describe estimators for $\xi_0$ and $\xi_{\mathrm{tail}}$.
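The index bookkeeping behind this three-way split can be sketched as follows. This is an illustration only: it assumes $m_0 = (\log n)^2$ is rounded up to an integer and that $J$ is the largest integer with $2^J \le n$; the helper names `block_boundaries` and `component` are hypothetical.

```python
import math


def block_boundaries(n: int):
    """Return (m, J): m[k] = 2**k * m0 for k = 0..J and J = max{j : 2**j <= n}."""
    if n < 2:
        raise ValueError("n must be at least 2")
    m0 = math.ceil(math.log(n) ** 2)  # assumption: round (log n)^2 up to an integer
    J = n.bit_length() - 1            # largest J with 2**J <= n
    return [m0 * 2 ** k for k in range(J + 1)], J


def component(i: int, m, J: int) -> str:
    """Name the component of Q(theta) that coordinate i >= 1 contributes to."""
    if i < 1:
        raise ValueError("indices start at 1")
    if i <= m[0]:
        return "head"   # contributes to xi_0
    if i <= m[J]:
        return "mid"    # contributes to xi_mid
    return "tail"       # contributes to xi_tail


m, J = block_boundaries(1024)
print(J, m[0], m[1], component(m[0] + 1, m, J))
```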

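The component $\xi_0$ is naturally estimated by the unbiased quadratic
estimator

(16)  $\hat\xi_0 = \sum_{i=1}^{m_0} (Y_i^2 - 1/n).$

Note that for $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$

(17)  $\sup_{\theta \in \Theta} E_\theta(\hat\xi_0 - \xi_0)^2 = \sup_{\theta \in \Theta} \{4\xi_0/n + 2m_0/n^2\} = 4A(\Theta) n^{-1} + o(n^{-1}),$

where $A(\Theta) = \sup_{\theta \in \Theta} \sum_{i=1}^\infty \theta_i^2$. It is clear that this term is
asymptotically equal to the minimax risk when fully efficient estimation is
possible and negligible whenever the minimax rate of convergence is slower than
the parametric rate of $n^{-1}$.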
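The head estimator and its exact risk are simple enough to write out. The sketch below assumes the observations are stored in a list with `y[i-1]` holding $Y_i$ and uses $\mathrm{Var}(Y_i^2) = 4\theta_i^2/n + 2/n^2$ for $Y_i \sim N(\theta_i, 1/n)$; the function names are ours.

```python
def xi0_hat(y, n: int, m0: int) -> float:
    """Unbiased estimate of xi_0 = sum_{i<=m0} theta_i^2; y[i-1] holds Y_i."""
    return sum(v * v - 1.0 / n for v in y[:m0])


def xi0_risk(theta, n: int, m0: int) -> float:
    """Exact mean squared error 4*xi_0/n + 2*m0/n^2 of xi0_hat."""
    xi0 = sum(t * t for t in theta[:m0])
    return 4.0 * xi0 / n + 2.0 * m0 / n ** 2


print(xi0_hat([1.0, 2.0, 3.0], n=4, m0=2))
print(xi0_risk([1.0, 2.0], n=4, m0=2))
```

Because the estimator is unbiased, its risk is pure variance, which is why the second term $2m_0/n^2$ is negligible once $m_0$ grows only like $(\log n)^2$.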

The technique underlying the estimation of the tail component is similar
to that used for minimax estimation of $Q(\theta)$ in the sparse case as given
in [10]. First define $\gamma_i$ by

(18)  $\gamma_i = 2(\lceil \log_2 (i/m_J) \rceil + 1), \quad i \ge m_J + 1,$

where $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$. That
is, $\gamma_i = 2(j - J + 2)$ for $m_j + 1 \le i \le m_{j+1}$ and $j \ge J$. Then the tail
component $\xi_{\mathrm{tail}}$ is estimated by a term-by-term thresholding estimator
with slowly growing threshold

(19)  $\hat\xi_{\mathrm{tail}} = \sum_{i=m_J+1}^{\infty} (Y_i^2 - \gamma_i \log n / n)_+ .$

We shall show that the risk due to estimation of the tail is always negligible
relative to the minimax risk for $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$, that is,

(20)  $\sup_{\theta \in \Theta} E_\theta(\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2 = o(n^{-r(\alpha,p)}).$
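A minimal sketch of the tail thresholds and the tail estimator follows, under the reading $\gamma_i = 2(\lceil \log_2(i/m_J) \rceil + 1)$ and threshold $\gamma_i \log n / n$. Truncating the infinite sum at the number of observed coordinates is our assumption, and the function names are illustrative.

```python
import math


def gamma_i(i: int, m_J: int) -> int:
    """gamma_i = 2 * (ceil(log2(i / m_J)) + 1) for i >= m_J + 1."""
    if m_J < 1 or i <= m_J:
        raise ValueError("need i > m_J >= 1")
    q = -(-i // m_J)  # ceil(i / m_J) in exact integer arithmetic
    # (q - 1).bit_length() equals ceil(log2(q)), which equals ceil(log2(i / m_J))
    return 2 * ((q - 1).bit_length() + 1)


def xi_tail_hat(y, n: int, m_J: int) -> float:
    """Thresholded tail estimate over the observed coordinates i > m_J.

    y[i-1] holds Y_i; truncating the infinite sum at len(y) is an assumption.
    """
    log_n = math.log(n)
    total = 0.0
    for i in range(m_J + 1, len(y) + 1):
        total += max(y[i - 1] ** 2 - gamma_i(i, m_J) * log_n / n, 0.0)
    return total


print([gamma_i(i, 4) for i in (5, 8, 9, 16, 17)])
```

The thresholds grow by 2 each time the index doubles, so coordinates far out in the tail must be increasingly large before they contribute.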

We now turn to estimation of the middle component $\xi_{\mathrm{mid}}$, which is more
involved and uses both block thresholding and term-by-term thresholding.
Let $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$. The estimator
$\hat\xi_{\mathrm{mid}}$ depends on the parameters $\alpha$ and $p$. For each integer $k$ such
that $1 \le k \le J - 1$ set $\tau_{k,i} = 2(j + 1 - k)$ for $m_j + 1 \le i \le m_{j+1}$ and
$k \le j \le J - 1$. That is,

(21)  $\tau_{k,i} = 2 \lceil \log_2 (i/m_k) \rceil, \quad i \ge m_k + 1,$

where $\lceil x \rceil$ once again denotes the smallest integer greater than or equal
to $x$. For $i \ge m_k + 1$, set $\mu_{k,i} = E_0\{(Y_i^2 - \tau_{k,i}/n)_+\}$ where the
expectation is taken under $\theta = 0$. Let

$\lambda_k = \dfrac{(m_k - m_0) + 2\sqrt{(m_k - m_0) \log(m_k - m_0)}}{n}.$

For each $1 \le k \le J - 1$ set

(22)  $\hat\xi_k = (\sum_{i=m_0+1}^{m_k} Y_i^2 - \lambda_k)_+ + \sum_{i=m_k+1}^{m_J} \{(Y_i^2 - \tau_{k,i}/n)_+ - \mu_{k,i}\}.$

Recall that $p_* = \min\{p, 2\}$ and $s_* = \alpha + 1/2 - 1/p_*$. Set $k_*$ to be the
largest integer such that

(23)  $m_{k_*} = 2^{k_*} m_0 \le \max\{2 m_0, n^{p_*/(1 + 2 p_* s_*)} (\log n)^{-1/(1 + 2 p_* s_*)}\}.$

The middle component $\xi_{\mathrm{mid}}$ is then estimated by

(24)  $\hat\xi_{\mathrm{mid}} = \hat\xi_{k_*} = (\sum_{i=m_0+1}^{m_{k_*}} Y_i^2 - \lambda_{k_*})_+ + \sum_{i=m_{k_*}+1}^{m_J} \{(Y_i^2 - \tau_{k_*,i}/n)_+ - \mu_{k_*,i}\}.$

We shall show that the risk of $\hat\xi_{\mathrm{mid}}$ for estimating $\xi_{\mathrm{mid}}$ is
negligible when fully efficient estimation is possible and otherwise attains
the lower bound given in Theorem 1 over the given $\Theta$. It should also be noted
that the first term used to define $\hat\xi_{\mathrm{mid}}$ would suffice for the dense
case where $p \ge 2$. The second term is needed for the sparse case where $p < 2$.
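To make the two-part structure of the middle estimator concrete, here is a minimal sketch of one candidate $\hat\xi_k$: a block-thresholded sum over $(m_0, m_k]$ followed by centered term-by-term thresholding over $(m_k, m_J]$. It assumes the readings $\tau_{k,i} = 2\lceil \log_2(i/m_k) \rceil$ and $\lambda_k = \{(m_k - m_0) + 2\sqrt{(m_k - m_0)\log(m_k - m_0)}\}/n$. The closed form used for $\mu_{k,i}$ is our own integration-by-parts evaluation of $E_0(Y^2 - \tau/n)_+$, and all function names are hypothetical.

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_sf(z: float) -> float:
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mu0(tau: float, n: int) -> float:
    """E_0 (Y^2 - tau/n)_+ for Y ~ N(0, 1/n), via integration by parts."""
    a = math.sqrt(tau)
    return (2.0 * a * normal_pdf(a) + 2.0 * (1.0 - tau) * normal_sf(a)) / n


def tau_ki(i: int, m_k: int) -> int:
    """tau_{k,i} = 2 * ceil(log2(i / m_k)) for i >= m_k + 1."""
    q = -(-i // m_k)  # ceil(i / m_k)
    return 2 * (q - 1).bit_length()


def lambda_k(m_k: int, m0: int, n: int) -> float:
    d = m_k - m0
    return (d + 2.0 * math.sqrt(d * math.log(d))) / n


def xi_k_hat(y, n: int, m0: int, m_k: int, m_J: int) -> float:
    """Candidate middle estimate; y[i-1] holds Y_i and len(y) >= m_J."""
    block = sum(y[i - 1] ** 2 for i in range(m0 + 1, m_k + 1))
    first = max(block - lambda_k(m_k, m0, n), 0.0)   # block thresholding
    second = 0.0
    for i in range(m_k + 1, m_J + 1):                # term-by-term thresholding
        tau = tau_ki(i, m_k)
        second += max(y[i - 1] ** 2 - tau / n, 0.0) - mu0(tau, n)
    return first + second
```

Subtracting `mu0` centers each thresholded term at zero when the corresponding $\theta_i$ vanishes, so the second sum does not accumulate positive bias over the many null coordinates.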

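The quadratic functional $Q(\theta)$ is then estimated by

(25)  $\hat Q_{k_*} = \hat\xi_0 + \hat\xi_{\mathrm{mid}} + \hat\xi_{\mathrm{tail}}.$

The following result shows that this estimator has desirable bias and variance
properties.

Proposition 1. Let $\Theta = B^\alpha_{p,q}(M)$ or $\Theta = L_p(\alpha, M)$ and let the
estimator $\hat Q_{k_*}$ be given as in (25). If $\alpha p_* < 1/2$, then the maximum
squared bias satisfies

(26)  $\sup_{\theta \in \Theta} (E_\theta \hat Q_{k_*} - Q(\theta))^2 \le C n^{-(2 - p_*/(1 + 2 p_* s_*))} (\log n)^{2 p_* s_*/(1 + 2 p_* s_*)}$

and the maximum variance satisfies

(27)  $\sup_{\theta \in \Theta} \mathrm{Var}(\hat Q_{k_*}) \le C n^{-(2 - p_*/(1 + 2 p_* s_*))} (\log n)^{-1/(1 + 2 p_* s_*)}.$

On the other hand, if $\alpha p_* > 1/2$, then $\hat Q_{k_*}$ is asymptotically
efficient, that is,

(28)  $\sup_{\theta \in \Theta} E_\theta(\hat Q_{k_*} - Q(\theta))^2 = 4A(\Theta) n^{-1}(1 + o(1)),$

where $A(\Theta) = \sup_{\theta \in \Theta} \sum_{i=1}^\infty \theta_i^2$. Furthermore, in the boundary
case of $\alpha p_* = 1/2$, $\hat Q_{k_*}$ satisfies

(29)  $\sup_{\theta \in \Theta} (E_\theta \hat Q_{k_*} - Q(\theta))^2 \le C n^{-1} (\log n)^{2 p_* s_*/(1 + 2 p_* s_*)}$

and

(30)  $\sup_{\theta \in \Theta} \mathrm{Var}(\hat Q_{k_*}) = 4A(\Theta) n^{-1}(1 + o(1)),$

where once again $A(\Theta) = \sup_{\theta \in \Theta} \sum_{i=1}^\infty \theta_i^2$.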

Note that the estimator $\hat Q_{k_*}$ has reduced variance and inflated bias
compared to the minimax risk when the minimax rate of convergence is slower
than the parametric rate. In fact in these cases the ratio of the maximum
squared bias to the maximum variance is exactly of order $\log n$. These
properties are crucial in the construction of the adaptive estimator given in
Section 3.2.
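The order $\log n$ of this ratio can be read off from the two bounds of Proposition 1. As a worked check, writing $r = 2 - p_*/(1 + 2p_*s_*)$ for the common polynomial exponent of the squared bias bound (26) and the variance bound (27):

```latex
\frac{n^{-r}(\log n)^{2p_* s_*/(1+2p_* s_*)}}
     {n^{-r}(\log n)^{-1/(1+2p_* s_*)}}
  = (\log n)^{(2p_* s_* + 1)/(1+2p_* s_*)}
  = \log n .
```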

3.2. Adaptive procedure. We shall now turn to the construction of a general
adaptive procedure building upon the collection of nonquadratic estimators
given in Section 3.1. The adaptive estimator is the maximization of
penalized versions of these nonquadratic estimators. Let
$\hat Q_k = \hat\xi_0 + \hat\xi_k + \hat\xi_{\mathrm{tail}}$ where $\hat\xi_0$, $\hat\xi_k$ and
$\hat\xi_{\mathrm{tail}}$ are defined in (16), (22) and (19), respectively. The
adaptive estimator is then given by

(31) Q= max <^ Qfc >. 

We shall show later that for any given Besov or $L_p$ ball the penalty term
in (31) is always a logarithmic factor larger than the maximum variance of
the estimator $\hat Q_k$. Moreover, the bias of the estimators $\hat Q_k$ is always
negligible when it is positive, whereas in worst cases it must be negative.
Taking a maximization with the penalty term results in an optimal trading of
bias and variance over all Besov and $L_p$ balls.

It is also convenient to define 

(32) 4mid = max \ ik ^ 

i<fc< J 1^ n ] 
Then the estimator $\hat Q$ can be equivalently written as

(33)  $\hat Q = \hat\xi_0 + \hat\xi_{\mathrm{mid}} + \hat\xi_{\mathrm{tail}},$

where $\hat\xi_0$, $\hat\xi_{\mathrm{mid}}$ and $\hat\xi_{\mathrm{tail}}$ are defined in (16), (32) and
(19), respectively.
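The equivalence of the two forms holds because only the middle component depends on $k$, so the maximization can be carried out over that component alone. The sketch below illustrates this. Since the exact penalty is not reproduced here, `penalties` is a caller-supplied placeholder (an assumption of this example), and the function names are ours.

```python
def adaptive_estimate(xi0_hat, xi_k_hats, penalties, xi_tail_hat):
    """max_k {Q_k - pen_k} with Q_k = xi0_hat + xi_k_hats[k] + xi_tail_hat.

    penalties[k] is a placeholder for the penalty attached to Q_k.
    """
    if not xi_k_hats or len(xi_k_hats) != len(penalties):
        raise ValueError("need one penalty per candidate estimator")
    q_hats = [xi0_hat + xk + xi_tail_hat for xk in xi_k_hats]
    return max(q - pen for q, pen in zip(q_hats, penalties))


def adaptive_estimate_split(xi0_hat, xi_k_hats, penalties, xi_tail_hat):
    """Same value, maximizing only over the penalized middle component."""
    if not xi_k_hats or len(xi_k_hats) != len(penalties):
        raise ValueError("need one penalty per candidate estimator")
    mid = max(xk - pen for xk, pen in zip(xi_k_hats, penalties))
    return xi0_hat + mid + xi_tail_hat


print(adaptive_estimate(1.0, [0.5, 1.0, 1.25], [0.25, 0.5, 1.0], 0.25))
```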

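The following theorem shows that the estimator $\hat Q$ is optimally adaptive
over all Besov and $L_p$ balls both for the dense and sparse cases.

Theorem 2. Let $Q(\theta) = \sum_{i=1}^\infty \theta_i^2$ and let the estimator $\hat Q$ be
defined as in (31). Then the risk of $\hat Q$ satisfies for all Besov balls
$\Theta = B^\alpha_{p,q}(M)$ and all $L_p$ balls $\Theta = L_p(\alpha, M)$

(34)  $\sup_{\theta \in \Theta} E_\theta(\hat Q - Q(\theta))^2 \le \begin{cases} 4A(\Theta) n^{-1}(1 + o(1)), & \text{for } \alpha p_* > 1/2, \\ C n^{-(2 - p_*/(1 + 2 p_* s_*))} (\log n)^{2 p_* s_*/(1 + 2 p_* s_*)}, & \text{for } \alpha p_* \le 1/2, \end{cases}$

where $C > 0$ and $A(\Theta) = \sup_{\theta \in \Theta} \sum_{i=1}^\infty \theta_i^2$ are constants.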



Comparing the upper bounds given in the above theorem with the lower 
bound given in Theorem 1 as well as the information bound given in (6), 
it is clear that the estimator Q is adaptive over all Besov and Lp balls. 
In particular, it is adaptively efficient over those parameter spaces where 
efficient estimation is possible. 

3.3. Discussion. It is interesting to compare the performance of the
adaptive estimator $\hat Q$ with the estimator, say $\hat Q_{LM}$, given in [25]. The
estimator there is constructed based on model selection. It chooses a penalized
quadratic estimator through maximization. In contrast, in this paper the
adaptive estimator $\hat Q$ selects among a collection of penalized nonquadratic
estimators. These nonquadratic estimators enable optimal adaptation over
sparse cases corresponding to Besov and $L_p$ balls with $p < 2$ in addition to
the standard dense cases of $p \ge 2$. In Table 1, $R(\hat Q_{LM})$ denotes the order
of the risk upper bound of $\hat Q_{LM}$ given in [25] and $R(\hat Q)$ denotes the order
of the maximum risk of $\hat Q$ as given in Theorem 2. The comparison is focused
on the sparse case where $p < 2$.
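A rough way to explore this comparison numerically is to compare polynomial rate exponents while ignoring logarithmic factors. The sketch below does this for $L_p$ balls with $p < 2$. It assumes that the bound for the estimator of [25] has the form $\min\{(\log n/n^2)^{4s/(1+4s)}, (\log n/n)^{4\alpha/(1+2\alpha)}\} + C/n$ with $s = \alpha + 1/2 - 1/p$; that reading, and the function name, are assumptions of this example.

```python
def sparse_exponents(alpha: float, p: float):
    """Polynomial rate exponents (ours, lm) for an L_p ball with p < 2.

    Logarithmic factors are ignored. The second exponent assumes the bound
    min{(log n / n^2)^(4s/(1+4s)), (log n / n)^(4a/(1+2a))} + C/n.
    """
    if not 0 < p < 2:
        raise ValueError("sparse case requires 0 < p < 2")
    s = alpha + 0.5 - 1.0 / p
    if s <= 0:
        raise ValueError("need s = alpha + 1/2 - 1/p > 0")
    ours = 1.0 if alpha * p > 0.5 else 2.0 - p / (1.0 + 2.0 * p * s)
    lm = min(1.0, max(8.0 * s / (1.0 + 4.0 * s),
                      4.0 * alpha / (1.0 + 2.0 * alpha)))
    return ours, lm


for alpha, p in [(0.45, 1.2), (0.35, 1.6), (0.3, 1.5), (0.6, 1.5)]:
    print(alpha, p, sparse_exponents(alpha, p))
```

A larger exponent means a faster rate, so a pair with `ours > lm` is a case where the bounds differ by an algebraic factor in $n$.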

Simple algebra shows that the risk upper bounds for $\hat Q_{LM}$ are always
larger by an algebraic factor than those for $\hat Q$ whenever $R(\hat Q_{LM}) \gg n^{-1}$. In
particular, if 1 < p < | and ^ < a < i, /2(Qlm) is of order (l2£2i)4a/{i+2a) 

whereas Q is fully efficient. Likewise when | < p < 2 and ^ < < | — ;j, 
R{Qlm) is of order (i2g!i)4s/(i+4.) Q 

is once again fully efficient. 
It is also interesting to note that the problem of estimating the quadratic
functional $Q(\theta)$ is strongly connected to the problem of estimating linear
functionals. This connection was developed first in [13] where it was shown
that a modulus of continuity for orthosymmetric parameter spaces could be
used to yield optimal quadratic minimax estimators in a way that is analogous
to a similar theory for minimax estimation of linear functionals given
in [12]. See [10] for further discussion of this connection and the connection to
estimating the whole signal $\theta$ in the minimax estimation setting. The
adaptation theory for estimating the quadratic functional $Q(\theta)$ developed in the
present paper is also similar to that for estimating linear functionals. For
linear functionals Lepski [26] was the first to show that logarithmic penalties
must often be paid when adapting over collections of parameter spaces, as
is the case in the present paper. Further refinements and generalizations for
adaptive estimation of linear functionals can be found in [9, 24, 27].

4. Proofs. The main results are proved in the order of Proposition 1,
Theorem 2 and then Theorem 1. Detailed proofs are only given for $L_p$ balls
since the proofs for Besov balls are entirely analogous. In this section $C$
denotes a positive constant not depending on $n$ that may vary from place to
place, $\phi(z)$ and $\Phi(z)$ denote the density and cumulative distribution function
of a standard normal random variable and $\bar\Phi(z) = 1 - \Phi(z)$.

4.1. Preparatory results. The following lemma helps in the analysis of
term-by-term thresholding estimators and is important to the proof of
Proposition 1 and Theorem 2.

Lemma 1. Let $X \sim N(\theta, 1/n)$ and $\tau \ge 1$. Set $\mu_0(\tau) = E_0\{(X^2 - \tau/n)_+\}$
where the expectation is taken under $\theta = 0$. Let
$\hat\xi = (X^2 - \tau/n)_+ - \mu_0(\tau)$. Then



Table 1 

Comparison of the performance of the estimators Qlm and Q 





0<p< 1 




1<P<I 








— 2p 


2i<«<l 




R{Qlm) 
R{Q) 


-1 

n 


^ log n ^4Q/(l + 2a) 
^_(2-p/(l+2p3)) 

x(logn)2f''/(i+2f'') 


/ log n \Aa / {l + 2a) 
^ n ' 


-1 

n 


|<p<2 




a < ^ - 1 
— p 


P — 2p 


1,^1 1 

2p — p 4 


p 4 


R{Q) 


^ logn ^4a/(l + 2a) 
^-(2-p/(l+2ps)) 

x(logn)2p''/(i+2p-) 


^ logn ^4s/(l+4s) 
„-(24>/(l+2ps)) 

x(logn)2P''/(i+2P'') 


/ logn -,4s/(l + 4s) 
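(35)  $|E_\theta \hat\xi - \theta^2| \le \min(2\tau/n, \theta^2)$

and the variance of $\hat\xi$ satisfies

(36)  $\mathrm{Var}(\hat\xi) \le \dfrac{6\theta^2}{n} + \dfrac{4\tau^{1/2} + 18}{n^2 e^{\tau/2}}.$

In addition, if $Z \sim N(0, 1)$ and $V(\tau) = \mathrm{Var}[(Z^2 - \tau)_+]$ then

(37)  $V(\tau) \le (16\tau^{-1/2} - 9\tau^{-3/2} + 9\tau^{-5/2}) \phi(\tau^{1/2}).$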

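Proof. Equations (35) and (36) are from [10]. For (37), it follows from
the standard alternating series tail bound
$\bar\Phi(z) \le (z^{-1} - z^{-3} + 3z^{-5}) \phi(z)$ for $z > 0$ that

$V(\tau) \le 2 \int_{\tau^{1/2}}^{\infty} (z^2 - \tau)^2 \phi(z) \, dz$

$= (6\tau^{1/2} - 2\tau^{3/2}) \phi(\tau^{1/2}) + (6 - 4\tau + 2\tau^2) \bar\Phi(\tau^{1/2})$

$\le (16\tau^{-1/2} - 9\tau^{-3/2} + 9\tau^{-5/2}) \phi(\tau^{1/2}).$ □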
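The bound (37) is easy to sanity-check numerically, because the first two moments of $(Z^2 - \tau)_+$ have closed forms in terms of the standard normal density and tail. The sketch below is a numerical check on a few values of $\tau \ge 1$, not a proof. The first-moment formula is our own integration-by-parts evaluation, and the second-moment formula is the one displayed in the proof above.

```python
import math


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_sf(z: float) -> float:
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def thresholded_variance(tau: float) -> float:
    """Exact V(tau) = Var[(Z^2 - tau)_+] for Z ~ N(0, 1)."""
    a = math.sqrt(tau)
    first = 2.0 * a * normal_pdf(a) + 2.0 * (1.0 - tau) * normal_sf(a)
    second = ((6.0 * a - 2.0 * a ** 3) * normal_pdf(a)
              + (6.0 - 4.0 * tau + 2.0 * tau ** 2) * normal_sf(a))
    return second - first ** 2


def variance_bound(tau: float) -> float:
    """Right-hand side of (37); intended for tau >= 1."""
    a = math.sqrt(tau)
    return (16.0 / a - 9.0 / a ** 3 + 9.0 / a ** 5) * normal_pdf(a)


for tau in (1.0, 2.0, 5.0, 10.0, 20.0):
    print(tau, thresholded_variance(tau), variance_bound(tau))
```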

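Lemmas 2 and 3 given below provide useful properties of term-by-term
thresholding estimators and are central to the proof of Theorem 2.

Lemma 2. Let $X_i = \theta_i + Z_i$ where $Z_i$ are i.i.d. $N(0, \sigma^2)$ for
$i = 1, 2, \ldots, m$. Let $\xi = \sum_{i=1}^m \theta_i^2$. Let $\lambda > 0$. Then for any $x$

(38)  $P\{\sum_{i=1}^m (X_i^2 - \lambda)_+ > x\} \le P\{[(Z_1 + \xi^{1/2})^2 - \lambda]_+ + \sum_{i=2}^m (Z_i^2 - \lambda)_+ > x\}.$

That is, for a given value of $\xi = \sum_{i=1}^m \theta_i^2$, the random variables
$\sum_{i=1}^m (X_i^2 - \lambda)_+$ are stochastically maximized when $\theta_1 = \xi^{1/2}$ and
$\theta_i = 0$ for all $i \ge 2$.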


Proof. Intuitively the result of this lemma seems clear since given the
sum $\sum_{i=1}^m X_i^2$ the value of $\sum_{i=1}^m (X_i^2 - \lambda)_+$ is a decreasing function of
the number of nonzero terms in this sum. A formal proof can be given as follows.
Begin with the case when $m = 2$. Let $x > 0$ and note that

$P\{(X_1^2 - \lambda)_+ + (X_2^2 - \lambda)_+ > x\} = E(P\{(X_1^2 - \lambda)_+ + (X_2^2 - \lambda)_+ > x \mid X_1^2 + X_2^2\}).$

It thus suffices to show that the conditional probability

$g(x; \theta_1, \theta_2, \rho) = P\{(X_1^2 - \lambda)_+ + (X_2^2 - \lambda)_+ > x \mid X_1^2 + X_2^2 = \rho^2\}$

is maximized when $\theta_1 = \xi^{1/2}$ and $\theta_2 = 0$ since the distribution of
$X_1^2 + X_2^2$ depends on $\theta_1$ and $\theta_2$ only through $\theta_1^2 + \theta_2^2 = \xi$.

Note that if $\rho^2 \le \lambda + x$, then $g(x; \theta_1, \theta_2, \rho) = 0 = g(x; \xi^{1/2}, 0, \rho)$.
On the other hand, if $\rho^2 > 2\lambda + x$, then
$g(x; \theta_1, \theta_2, \rho) = 1 = g(x; \xi^{1/2}, 0, \rho)$. Now consider the main case
$\lambda + x < \rho^2 \le 2\lambda + x$. In this case

$g(x; \theta_1, \theta_2, \rho) = P\{X_1^2 > \lambda + x \mid X_1^2 + X_2^2 = \rho^2\} + P\{X_2^2 > \lambda + x \mid X_1^2 + X_2^2 = \rho^2\}.$

It is more convenient to use polar coordinates by setting $X_1 = \rho \cos(\varphi)$,
$X_2 = \rho \sin(\varphi)$. The conditional distribution of $(X_1, X_2)$ given
$X_1^2 + X_2^2$ is a von Mises distribution. See, for example, [7, 28, 30].

Since the distribution of $X_i^2$ depends only on $\theta_i^2$, $i = 1, 2$, without loss
of generality we assume $\theta_i \ge 0$. Let $\beta$ be the angle between the direction
of $(\theta_1, \theta_2)$ and the horizontal axis. More precisely,
$\cos(\beta) = \theta_1 / \sqrt{\theta_1^2 + \theta_2^2}$. Then $0 \le \beta \le \pi/2$. The conditional
distribution of $\varphi$ given $\rho$ is thus given by
$q_\beta(\varphi) = c e^{d \cos(\varphi - \beta)}$ where $c$ and $d$ are some positive constants.
See [30]. Let

$u_\beta(\varphi) = q_\beta(\varphi) + q_\beta(\varphi + \pi/2) + q_\beta(\varphi + \pi) + q_\beta(\varphi + 3\pi/2)$

$= c e^{d \cos(\varphi - \beta)} + c e^{-d \cos(\varphi - \beta)} + c e^{d \sin(\varphi - \beta)} + c e^{-d \sin(\varphi - \beta)}.$

Then $g(x; \theta_1, \theta_2, \rho) = 1 - \int_{\varphi_0}^{\pi/2 - \varphi_0} u_\beta(\varphi) \, d\varphi$, where
$\varphi_0 = \cos^{-1}((\lambda + x)^{1/2}/\rho)$. Note that $0 < \varphi_0 \le \pi/4$.

It is easy to check that $u_\beta(\varphi)$ has the following properties:

• It is periodic with period $\pi/2$, $u_\beta(\varphi) = u_\beta(\varphi + \pi/2)$.

• $u_\beta(\varphi)$ attains its maximum when $\varphi = \beta$.

• $u_\beta(\beta + x) = u_\beta(\beta - x)$.

• $u_\beta$ is decreasing on the interval $[\beta, \beta + \pi/4)$.

Noting the properties of $u_\beta$, it now follows from the rearrangement result
in [20], page 278, that the above integral is minimized, and hence $g$ is
maximized, when $\beta = 0$, which corresponds to $\theta_1 = \xi^{1/2}$ and $\theta_2 = 0$.
Hence $g(x; \theta_1, \theta_2, \rho) \le g(x; \xi^{1/2}, 0, \rho)$.
This completes the proof for $m = 2$. The general case now follows by first
conditioning on $X_1^2 + X_2^2, X_3^2, \ldots, X_m^2$ and then by induction. □
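The key step for $m = 2$ can be illustrated numerically. The sketch below evaluates the integral of $u_\beta$ over $[\varphi_0, \pi/2 - \varphi_0]$ by the midpoint rule, with the normalizing constant set to $c = 1$ (it does not affect the comparison across $\beta$) and arbitrary illustrative values of $d$ and $\varphi_0$. It checks that this integral is smallest at $\beta = 0$, so that one minus the integral is largest there. It is an illustration of the rearrangement argument, not a substitute for it.

```python
import math


def u(phi: float, beta: float, d: float) -> float:
    """Sum of exp(d*cos(.)) over the four rotations of phi by pi/2 (c = 1)."""
    t = phi - beta
    return 2.0 * math.cosh(d * math.cos(t)) + 2.0 * math.cosh(d * math.sin(t))


def excluded_mass(beta: float, d: float, phi0: float, steps: int = 4000) -> float:
    """Midpoint-rule value of the integral of u over [phi0, pi/2 - phi0]."""
    lo, hi = phi0, math.pi / 2.0 - phi0
    h = (hi - lo) / steps
    return h * sum(u(lo + (j + 0.5) * h, beta, d) for j in range(steps))


d, phi0 = 3.0, 0.3  # illustrative values only
base = excluded_mass(0.0, d, phi0)
for k in range(0, 21, 5):
    beta = k * math.pi / 40.0
    print(round(beta, 4), excluded_mass(beta, d, phi0) - base)
```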

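Lemma 3. Let $Z_i$ be i.i.d. $N(0, 1)$, $i = 1, \ldots, m_n$, with $m_n \ge n$. Let
$\gamma > 0$ be fixed. Let $\tau_{n,i} \ge 0$, $\mu_{n,i} = E[(Z_i^2 - \tau_{n,i})_+]$,
$\sigma_{n,i}^2 = \mathrm{Var}[(Z_i^2 - \tau_{n,i})_+]$ and
$V_n = \sum_{i=1}^{m_n} \sigma_{n,i}^2$. Then there exists some absolute constant $c_* > 0$
such that for all sufficiently large $n$

(39)  $E\{(\sum_{i=1}^{m_n} [(Z_i^2 - \tau_{n,i})_+ - \mu_{n,i}] - (\gamma V_n \log n)^{1/2})_+\}^2 \le (2\gamma V_n \log n + c_* V_n^{1/2}) (\gamma \log n)^{-1/4} n^{-\gamma/4}.$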

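Proof. Set $A_n = E\{(\sum_{i=1}^{m_n} [(Z_i^2 - \tau_{n,i})_+ - \mu_{n,i}] - (\gamma V_n \log n)^{1/2})_+\}^2$.
The Cauchy-Schwarz inequality then yields

$A_n \le \{E[\sum_{i=1}^{m_n} [(Z_i^2 - \tau_{n,i})_+ - \mu_{n,i}] - (\gamma V_n \log n)^{1/2}]^4\}^{1/2} \times \{P[\sum_{i=1}^{m_n} [(Z_i^2 - \tau_{n,i})_+ - \mu_{n,i}] > (\gamma V_n \log n)^{1/2}]\}^{1/2}.$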



It is easy to verify by direct calculations that 

(40) E\{Zl-Tn,i)+-lln,-^^ 



sup 

Tn,i>0 



< oo 



and that the characteristic functions of $(Z_i^2 - \tau_{n,i})_+$ are analytic. It then
follows from [17], page 553, and the standard normal tail bound
$\bar\Phi(z) \le z^{-1} \phi(z)$ for $z > 0$ that there exists some $n_* > 0$ such that for
all $n \ge n_*$,

$P(\sum_{i=1}^{m_n} [(Z_i^2 - \tau_{n,i})_+ - \mu_{n,i}] > (\gamma V_n \log n)^{1/2}) \le (\gamma \log n)^{-1/2} n^{-\gamma/2}.$

Set $B_n = E(\sum_{i=1}^{m_n} [(Z_i^2 - \tau_{n,i})_+ - \mu_{n,i}] - (\gamma V_n \log n)^{1/2})^4$. It then
follows from Rosenthal's inequality ([19], page 23) that for some absolute
constant $c_1 > 0$

/ m„ \ 4 



+ 4(7Klogn)^ 



<cAj2 E[{Z! - Tn,i)+ - fJin,if + + 4(7K logn)2. 



It is also easy to verify by direct calculations that 



(42) 



sup 

Tn.i>0 



Em-rn,i) 



< 00 



and hence $B_n \le (4\gamma^2 \log^2 n + c_1) V_n^2 + c_2 V_n$ for some absolute constant
$c_2 > 0$. Therefore for some absolute constant $c_* > 0$,
$A_n \le (2\gamma V_n \log n + c_* V_n^{1/2}) (\gamma \log n)^{-1/4} n^{-\gamma/4}$ for all
sufficiently large $n$. □
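4.2. Proof of Proposition 1. The proof of Proposition 1 relies heavily
on Lemma 1. Denote by $B(\theta)$ and $V(\theta)$ the bias and variance of $\hat Q_{k_*}$,
respectively. Set $\Theta = L_p(\alpha, M)$. Let $\xi_{\mathrm{mid}}$ and $\xi_{\mathrm{tail}}$ be given
as in (15). Set $\xi_{\mathrm{mid1}} = \sum_{i=m_0+1}^{m_{k_*}} \theta_i^2$,
$\xi_{\mathrm{mid2}} = \sum_{i=m_{k_*}+1}^{m_J} \theta_i^2$,
$\hat\xi_{\mathrm{mid1}} = (\sum_{i=m_0+1}^{m_{k_*}} Y_i^2 - \lambda_{k_*})_+$ and
$\hat\xi_{\mathrm{mid2}} = \sum_{i=m_{k_*}+1}^{m_J} \{(Y_i^2 - \tau_{k_*,i}/n)_+ - \mu_{k_*,i}\}$.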

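We shall consider the bias and variance separately. First consider the
variance. Note that $m_0 = (\log n)^2$ and $\xi_0 \le \sum_{i=1}^\infty \theta_i^2 \le A(\Theta)$. Hence

(43)  $\mathrm{Var}(\hat\xi_0) = \sum_{i=1}^{m_0} \mathrm{Var}(Y_i^2) = \dfrac{4\xi_0}{n} + \dfrac{2(\log n)^2}{n^2} \le \dfrac{4A(\Theta)}{n}(1 + o(1)).$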

Note that for any random variable X, Var((X)+) < Var(X). See, for exam- 
pie, [10]. Hence Var(Udi) < E^^t.+i Var(y,2) = _^ 2(m,^^-m,) _ ^^^^^ 

fic ir^^'^ +18 

1 yields that \&i{Uid2) < + ETJm^^+i ^2pk„0/^ consequently 

Var(lmid) = Var(lmidi) +Var(|mid2) 

<6Ud , 2m^, ^ <1 + 18 

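Note that for $m_{k_*+j-1}+1 \le i \le m_{k_*+j}$ and $j \ge 1$, $\tau_{k_*,i} = 2j$. Hence

(44)  $$\sum_{i=m_{k_*}+1}^{m_J}\frac{4\tau_{k_*,i}^{1/2}+18}{e^{\tau_{k_*,i}/2}n^2} = \sum_{j=1}^{J-k_*}\sum_{i=m_{k_*+j-1}+1}^{m_{k_*+j}}(4(2j)^{1/2}+18)e^{-j}n^{-2} = \sum_{j=1}^{J-k_*}(4(2j)^{1/2}+18)e^{-j}2^{j-1}m_{k_*}n^{-2} \le Cm_{k_*}n^{-2}$$

for some constant $C > 0$, since $\sum_{j=1}^{\infty}(4(2j)^{1/2}+18)e^{-j}2^{j-1} < \infty$. Therefore

(45)  $$\mathrm{Var}(\hat\xi_{\mathrm{mid}}) \le 6\xi_{\mathrm{mid}}n^{-1} + Cm_{k_*}n^{-2}.$$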
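
The convergence of the series used in (44) can be checked numerically. The following is only an illustrative sketch; the function name `variance_series_partial_sum` is hypothetical and is not part of the paper. Successive terms shrink roughly like $(2/e)^j$, so the partial sums stabilize quickly.

```python
import math

def variance_series_partial_sum(n_terms):
    """Partial sum of sum_{j>=1} (4*sqrt(2j) + 18) * exp(-j) * 2**(j-1)."""
    return sum(
        (4.0 * math.sqrt(2 * j) + 18.0) * math.exp(-j) * 2.0 ** (j - 1)
        for j in range(1, n_terms + 1)
    )

# Two partial sums far apart; if the series converges they should agree closely.
s_200 = variance_series_partial_sum(200)
s_400 = variance_series_partial_sum(400)
print(s_200, s_400)
```

Because the ratio of successive terms tends to $2/e < 1$, any finite value of this sum can serve as the constant $C$ in (44).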

For the tail component, note that Lemma 1 once again yields

$$\mathrm{Var}(\hat\xi_{\mathrm{tail}}) \le \frac{6\xi_{\mathrm{tail}}}{n} + \sum_{i=m_J+1}^{\infty}\frac{4\tau_i^{1/2}(\log n)^{1/2}+18}{n^{2+\tau_i/2}}.$$

Using a similar derivation as in (44), we have $\sum_{i=m_J+1}^{\infty}\frac{4\tau_i^{1/2}(\log n)^{1/2}+18}{n^{2+\tau_i/2}} \le Cn^{-2}(\log n)^{-3/2}$ for $n \ge 3$ and some constant $C > 0$. Hence

(46)  $$\mathrm{Var}(\hat\xi_{\mathrm{tail}}) = o(n^{-1}).$$
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We now turn to the bias. Note that 

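$$(E_\theta\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2 \le 2(E_\theta\hat\xi_{\mathrm{mid}1} - \xi_{\mathrm{mid}1})^2 + 2(E_\theta\hat\xi_{\mathrm{mid}2} - \xi_{\mathrm{mid}2})^2$$

and

$$(E_\theta\hat\xi_{\mathrm{mid}1} - \xi_{\mathrm{mid}1})^2 \le E_\theta(\hat\xi_{\mathrm{mid}1} - \xi_{\mathrm{mid}1})^2 = E_\theta\Big\{\Big(\sum_{i=m_0+1}^{m_{k_*}}Y_i^2 - \lambda_{k_*}\Big)_+ - \xi_{\mathrm{mid}1}\Big\}^2$$
$$\le E_\theta\Big(\sum_{i=m_0+1}^{m_{k_*}}Y_i^2 - \lambda_{k_*} - \xi_{\mathrm{mid}1}\Big)^2 = \mathrm{Var}\Big(\sum_{i=m_0+1}^{m_{k_*}}Y_i^2\Big) + \Big(\frac{m_{k_*}-m_0}{n} - \lambda_{k_*}\Big)^2 \le \frac{4\xi_{\mathrm{mid}1}}{n} + \frac{5m_{k_*}\log m_{k_*}}{n^2}.$$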



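On the other hand, Lemma 1 shows that

$$(E_\theta\hat\xi_{\mathrm{mid}2} - \xi_{\mathrm{mid}2})^2 \le \Big(\sum_{i=m_{k_*}+1}^{m_J}\min\Big(\frac{\tau_{k_*,i}}{n},\theta_i^2\Big)\Big)^2$$

and consequently the squared bias of the middle component satisfies

(47)  $$(E_\theta\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2 \le \frac{8\xi_{\mathrm{mid}1}}{n} + \frac{10m_{k_*}\log m_{k_*}}{n^2} + 2\Big(\sum_{i=m_{k_*}+1}^{m_J}\min\Big(\frac{\tau_{k_*,i}}{n},\theta_i^2\Big)\Big)^2.$$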

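Now consider the tail component. In this case Lemma 1 shows that the absolute bias of the tail component satisfies

$$|E_\theta\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}}| \le \sum_{i=m_J+1}^{\infty}\min\Big(\frac{\tau_i\log n}{n},\theta_i^2\Big) + \sum_{i=m_J+1}^{\infty}\frac{4}{(2\pi\tau_i)^{1/2}(\log n)^{1/2}n^{1+\tau_i/2}}.$$

Note that $\tau_i = 2(j+2)$ for $m_{J+j}+1 \le i \le m_{J+j+1}$ and $j \ge 0$. Hence

$$\sum_{i=m_J+1}^{\infty}\frac{4}{(2\pi\tau_i)^{1/2}(\log n)^{1/2}n^{1+\tau_i/2}} = \sum_{j=0}^{\infty}\sum_{i=m_{J+j}+1}^{m_{J+j+1}}\frac{4}{(4\pi(j+2))^{1/2}(\log n)^{1/2}n^{3+j}}$$
$$\le \sum_{j=0}^{\infty}2^j\frac{n^2}{(\log n)^2}\cdot\frac{4}{(4\pi(j+2))^{1/2}(\log n)^{1/2}n^{3+j}} \le Cn^{-1}(\log n)^{-5/2}$$

since $\sum_{j=0}^{\infty}\frac{2^j}{(j+2)^{1/2}n^j} \le \sum_{j=0}^{\infty}\frac{2^j}{(j+2)^{1/2}3^j} < \infty$ whenever $n \ge 3$. Hence the squared bias of the tail component satisfies

(48)  $$(E_\theta\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2 \le \Big\{\sum_{i=m_J+1}^{\infty}\min\Big(\frac{\tau_i\log n}{n},\theta_i^2\Big) + Cn^{-1}(\log n)^{-5/2}\Big\}^2.$$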

We shall consider four separate cases. 

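Case 1. $p \ge 2$ and $\alpha > \frac14$. In this case $k_* = 1$. Note that

$$\mathrm{Var}(\hat Q_{k_*}) = \mathrm{Var}(\hat\xi_0) + \mathrm{Var}(\hat\xi_{\mathrm{mid}}) + \mathrm{Var}(\hat\xi_{\mathrm{tail}}).$$

Note also that there exists a constant $C > 0$ such that for any $m \ge 1$

(49)  $$\sup_{\theta\in\Theta}\sum_{i=m+1}^{\infty}\theta_i^2 \le Cm^{-2\alpha}.$$

It then follows from (45) that $\sup_{\theta\in\Theta}\mathrm{Var}(\hat\xi_{\mathrm{mid}}) = o(n^{-1})$. This, together with (43) and (46), shows that $\sup_{\theta\in\Theta}V(\theta) = 4A(\Theta)n^{-1}(1+o(1))$.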

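Now consider the bias. Note that $\hat\xi_0$ is an unbiased estimator of $\xi_0$. Hence

$$B^2(\theta) \le 2(E_\theta\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2 + 2(E_\theta\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2.$$

Note that in this case $n(\log n)^{-2} \le m_{k_*} = 2m_0 \le 2n(\log n)^{-2}$ and $\alpha > \frac14$. Then (49) together with (47) and (48) yield that

$$\sup_{\theta\in\Theta}(E_\theta\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2 \le \sup_{\theta\in\Theta}\Big\{\frac{8\xi_{\mathrm{mid}1}}{n} + \frac{10m_{k_*}\log m_{k_*}}{n^2} + 2\Big(\sum_{i=m_{k_*}+1}^{m_J}\theta_i^2\Big)^2\Big\} = o(n^{-1})$$

and

$$\sup_{\theta\in\Theta}(E_\theta\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2 \le \sup_{\theta\in\Theta}\Big\{\sum_{i=m_J+1}^{\infty}\theta_i^2 + Cn^{-1}(\log n)^{-5/2}\Big\}^2 = o(n^{-1})$$

and consequently $\sup_{\theta\in\Theta}B^2(\theta) = o(n^{-1})$.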

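Case 2. $p \ge 2$ and $\alpha \le \frac14$. In this case $m_{k_*}$ satisfies

$$\tfrac12 n^{2/(1+4\alpha)}(\log n)^{-1/(1+4\alpha)} \le m_{k_*} \le n^{2/(1+4\alpha)}(\log n)^{-1/(1+4\alpha)}.$$

It then follows from (45) and (49) that

$$\sup_{\theta\in\Theta}\mathrm{Var}(\hat\xi_{\mathrm{mid}}) \le Cn^{-8\alpha/(1+4\alpha)}(\log n)^{-1/(1+4\alpha)}.$$

This together with (43) and (46) yields that $\sup_{\theta\in\Theta}V(\theta) \le Cn^{-8\alpha/(1+4\alpha)}(\log n)^{-1/(1+4\alpha)}$ for $\alpha < \frac14$ and $\sup_{\theta\in\Theta}V(\theta) = 4A(\Theta)n^{-1}(1+o(1))$ for $\alpha = \frac14$.

For the bias it follows from (47), (48) and (49) that

$$\sup_{\theta\in\Theta}(E_\theta\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2 \le \sup_{\theta\in\Theta}\Big\{\frac{8\xi_{\mathrm{mid}1}}{n} + \frac{10m_{k_*}\log m_{k_*}}{n^2} + 2\Big(\sum_{i=m_{k_*}+1}^{m_J}\theta_i^2\Big)^2\Big\} \le C\Big(\frac{\log n}{n^2}\Big)^{4\alpha/(1+4\alpha)},$$

$$\sup_{\theta\in\Theta}(E_\theta\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2 \le \sup_{\theta\in\Theta}\Big\{\sum_{i=m_J+1}^{\infty}\theta_i^2 + Cn^{-1}(\log n)^{-5/2}\Big\}^2 = o(n^{-8\alpha/(1+4\alpha)}),$$

and hence $\sup_{\theta\in\Theta}B^2(\theta) \le C\big(\frac{\log n}{n^2}\big)^{4\alpha/(1+4\alpha)}$.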

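Case 3. $p < 2$ and $\alpha > \frac{1}{2p}$. This case is similar to Case 1. Note that in this case $k_* = 1$. Note also that for $p < 2$ the $L_p$ ball constraint (2) yields that for any $m \ge 1$

(50)  $$\sup_{\theta\in\Theta}\sum_{i=m+1}^{\infty}\theta_i^2 \le M^2m^{-2s}.$$

It then follows from (45) that $\sup_{\theta\in\Theta}\mathrm{Var}(\hat\xi_{\mathrm{mid}}) = o(n^{-1})$ and thus $\sup_{\theta\in\Theta}V(\theta) = 4A(\Theta)n^{-1}(1+o(1))$.

We now turn to the bias. Note that it is straightforward to verify that for all $\theta \in L_p(\alpha,M)$ and all $j \ge 1$

(51)  $$\sum_{i=m_{k_*+j-1}+1}^{m_{k_*+j}}|\theta_i|^p \le M^p2^{ps}2^{-jps}m_{k_*}^{-ps}.$$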

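Note also that for $m_{k_*+j-1}+1 \le i \le m_{k_*+j}$, $\tau_{k_*,i} = 2j$. Hence

$$\sum_{i=m_{k_*}+1}^{m_J}\min\Big(\frac{\tau_{k_*,i}}{n},\theta_i^2\Big) = \sum_{j=1}^{J-k_*}\sum_{i=m_{k_*+j-1}+1}^{m_{k_*+j}}\frac{2j}{n}\min\Big(1,\theta_i^2\cdot\frac{n}{2j}\Big) \le \sum_{j=1}^{J-k_*}\sum_{i=m_{k_*+j-1}+1}^{m_{k_*+j}}\frac{2j}{n}\Big(\theta_i^2\cdot\frac{n}{2j}\Big)^{p/2},$$

where the last step follows from the facts $\min(1,\theta_i^2\cdot\frac{n}{2j}) \le 1$ and $\frac{p}{2} \le 1$. Hence, by (51),

(52)  $$\sum_{i=m_{k_*}+1}^{m_J}\min\Big(\frac{\tau_{k_*,i}}{n},\theta_i^2\Big) \le Cm_{k_*}^{-ps}n^{-(1-p/2)}$$

for some constant $C > 0$, since $\sum_{j=1}^{\infty}j^{1-p/2}2^{-jps} < \infty$. Similarly,

(53)  $$\sum_{i=m_J+1}^{\infty}\min\Big(\frac{\tau_i\log n}{n},\theta_i^2\Big) \le Cm_J^{-ps}n^{-(1-p/2)}(\log n)^{1-p/2}.$$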

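Note that $m_{k_*} = 2m_0 \ge n(\log n)^{-2}$ and $m_J \ge \frac12 n^2(\log n)^{-2}$. Note also that in this case $\alpha p > \frac12$. Hence $m_{k_*}^{-ps}n^{-(1-p/2)} = o(n^{-1/2})$ and $m_J^{-ps}n^{-(1-p/2)}(\log n)^{1-p/2} = o(n^{-1/2})$. Bounds in (52) and (53) together with (47) and (48) yield

$$\sup_{\theta\in\Theta}(E_\theta\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2 = o(n^{-1}) \quad\text{and}\quad \sup_{\theta\in\Theta}(E_\theta\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2 = o(n^{-1})$$

and consequently $\sup_{\theta\in\Theta}B^2(\theta) = o(n^{-1})$.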

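Case 4. $p < 2$ and $\alpha \le \frac{1}{2p}$. Note that in this case

$$\tfrac12 n^{p/(1+2ps)}(\log n)^{-1/(1+2ps)} \le m_{k_*} \le n^{p/(1+2ps)}(\log n)^{-1/(1+2ps)}.$$

It follows from (45) and (49) that $\sup_{\theta\in\Theta}\mathrm{Var}(\hat\xi_{\mathrm{mid}}) \le Cn^{-(2-p/(1+2ps))}(\log n)^{-1/(1+2ps)}$. This together with (43) and (46) yields that $\sup_{\theta\in\Theta}V(\theta) \le Cn^{-(2-p/(1+2ps))}(\log n)^{-1/(1+2ps)}$ for $\alpha < \frac{1}{2p}$ and $\sup_{\theta\in\Theta}V(\theta) = 4A(\Theta)n^{-1}(1+o(1))$ for $\alpha = \frac{1}{2p}$.

On the other hand, (52) and (53) yield

(54)  $$\sum_{i=m_{k_*}+1}^{m_J}\min\Big(\frac{\tau_{k_*,i}}{n},\theta_i^2\Big) \le Cn^{-\frac12(2-p/(1+2ps))}(\log n)^{ps/(1+2ps)},$$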
g niin ("^^^i^ , < C7n-P(°+^) (log n)P("+^) 

i=mj+l 

(55) 

= o(n-^/2(^-P/(^+2P''))). 
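
The exponent of $n$ in (54) is pure arithmetic: plugging $m = n^{p/(1+2ps)}$ into the bound $m^{-ps}n^{-(1-p/2)}$ from (52) should give $n^{-\frac12(2-p/(1+2ps))}$. The sketch below checks this with exact rational arithmetic, ignoring logarithmic factors. The function names and the sample values of $p$ and $s$ are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction as F

def mid_bias_exponent(p, s):
    """Exponent of n in m^{-ps} * n^{-(1 - p/2)} when m = n^{p/(1+2ps)} (logs ignored)."""
    return -(p * s) * p / (1 + 2 * p * s) - (1 - p / 2)

def claimed_exponent(p, s):
    """Exponent of n on the right-hand side of (54)."""
    return -F(1, 2) * (2 - p / (1 + 2 * p * s))

# Hypothetical sample points with 0 < p < 2 and s > 0.
samples = [(F(3, 2), F(1, 30)), (F(1), F(1, 10)), (F(1, 2), F(1, 4)), (F(7, 4), F(1, 6))]
matches = [mid_bias_exponent(p, s) == claimed_exponent(p, s) for p, s in samples]
print(matches)

# Boundary alpha * p = 1/2 with p = 3/2: alpha = 1/3, s = alpha + 1/2 - 1/p = 1/6.
boundary_squared_rate = 2 * claimed_exponent(F(3, 2), F(1, 6))
print(boundary_squared_rate)
```

At the boundary $\alpha p = \frac12$ the squared bias exponent equals $-1$, which agrees with the parametric rate $n^{-1}$ appearing in the other cases.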
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It now follows from (47) and (48) that 

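$$\sup_{\theta\in\Theta}(E_\theta\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2 \le Cn^{-(2-p/(1+2ps))}(\log n)^{2ps/(1+2ps)},$$

$$\sup_{\theta\in\Theta}(E_\theta\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2 = o(n^{-(2-p/(1+2ps))}),$$

and hence $\sup_{\theta\in\Theta}B^2(\theta) \le Cn^{-(2-p/(1+2ps))}(\log n)^{2ps/(1+2ps)}$.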

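Remark. An inspection of the proof of Proposition 1 yields the following maximum mean squared error results for $\Theta = L_p(\alpha,M)$ and $\Theta = B^{\alpha}_{p,q}(M)$ which are useful for the proof of Theorem 2:

(56)  $$\sup_{\theta\in\Theta}E_\theta(\hat\xi_0 - \xi_0)^2 = 4A(\Theta)n^{-1}(1+o(1)),$$

(57)  $$\sup_{\theta\in\Theta}E_\theta(\hat\xi_{k_*} - \xi_{\mathrm{mid}})^2 \le \begin{cases} o(n^{-1}), & \text{if } \alpha p_* > \frac12,\\ Cn^{-(2-p_*/(1+2p_*s_*))}(\log n)^{2p_*s_*/(1+2p_*s_*)}, & \text{if } \alpha p_* \le \frac12,\end{cases}$$

(58)  $$\sup_{\theta\in\Theta}E_\theta(\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2 = \begin{cases} o(n^{-1}), & \text{if } \alpha p_* > \frac12,\\ o(n^{-(2-p_*/(1+2p_*s_*))}), & \text{if } \alpha p_* \le \frac12.\end{cases}$$

It should be stressed that $\hat\xi_{k_*}$ in (57) is the estimator defined by (24) which corresponds to a fixed Besov or $L_p$ ball.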

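4.3. Proof of Theorem 2. Let the estimator $\hat Q$ be given as in (31) and set $\Theta = L_p(\alpha,M)$. Note that $Q(\theta) = \xi_0 + \xi_{\mathrm{mid}} + \xi_{\mathrm{tail}}$. Note also that $\hat\xi_0$ is an unbiased estimate of $\xi_0$ and is independent of $\hat\xi_{\mathrm{mid}}$ and $\hat\xi_{\mathrm{tail}}$. Let the estimator $\hat Q$ be written as in (33). Then

(59)  $$E_\theta(\hat Q - Q(\theta))^2 = E_\theta(\hat\xi_0 - \xi_0 + \hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}} + \hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2$$
$$= E_\theta(\hat\xi_0 - \xi_0)^2 + E_\theta(\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2 + E_\theta(\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2 + 2E_\theta(\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})E_\theta(\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})$$
$$\le E_\theta(\hat\xi_0 - \xi_0)^2 + 2E_\theta(\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2 + 2E_\theta(\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2.$$

The difficulty lies in the analysis of $E_\theta(\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2$ where $\hat\xi_{\mathrm{mid}}$ is given in (32), since the other terms $E_\theta(\hat\xi_0 - \xi_0)^2$ and $E_\theta(\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2$ satisfy (56) and (58). Set $\omega_k = 6m_k^{1/2}(\log n)^{1/2}n^{-1}$. A simple but important observation is that

$$(\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2 = \Big(\max_{1\le k\le J}\{\hat\xi_k - \omega_k\} - \xi_{\mathrm{mid}}\Big)^2 \le \min_{1\le k\le J}\{(\hat\xi_k - \omega_k - \xi_{\mathrm{mid}})^2\} + \sum_{k=1}^{J}[(\hat\xi_k - \omega_k - \xi_{\mathrm{mid}})_+]^2.$$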
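
The observation above is a deterministic inequality about a maximum of real numbers, so it can be spot-checked on random inputs. The sketch below is illustrative only: the helper names `lhs` and `rhs`, the sampling ranges and the seed are assumptions, and the list `a` stands in for the penalized values $\hat\xi_k - \omega_k$.

```python
import random

def pos(x):
    """Positive part (x)_+."""
    return max(x, 0.0)

def lhs(a, xi):
    """(max_k a_k - xi)^2, with a_k playing the role of xi_hat_k - omega_k."""
    return (max(a) - xi) ** 2

def rhs(a, xi):
    """min_k (a_k - xi)^2 + sum_k [(a_k - xi)_+]^2."""
    return min((ak - xi) ** 2 for ak in a) + sum(pos(ak - xi) ** 2 for ak in a)

rng = random.Random(0)  # fixed seed so the check is reproducible
violations = 0
for _ in range(10000):
    J = rng.randint(1, 8)
    a = [rng.uniform(-5.0, 5.0) for _ in range(J)]
    xi = rng.uniform(0.0, 5.0)
    if lhs(a, xi) > rhs(a, xi) + 1e-12:
        violations += 1
print(violations)
```

The reason no violation can occur: if the maximum exceeds `xi`, the left side is one of the positive-part terms in the sum; otherwise every `a_k` lies below `xi` and the maximum is the closest value, so the left side equals the minimum term.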
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Hence 

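(60)  $$E_\theta(\hat\xi_{\mathrm{mid}} - \xi_{\mathrm{mid}})^2 \le \min_{1\le k\le J}\{E_\theta(\hat\xi_k - \omega_k - \xi_{\mathrm{mid}})^2\} + \sum_{k=1}^{J}E_\theta[(\hat\xi_k - \omega_k - \xi_{\mathrm{mid}})_+]^2$$
$$\le E_\theta(\hat\xi_{k_*} - \omega_{k_*} - \xi_{\mathrm{mid}})^2 + \sum_{k=1}^{J}E_\theta[(\hat\xi_k - \omega_k - \xi_{\mathrm{mid}})_+]^2,$$

where $k_*$ is defined as in (23). The major difficulty in the analysis which follows is to show that the second term on the right-hand side of (60) is always negligible compared to the minimax risk. The first term is the dominant term and its analysis is made straightforward by the bounds given in (57).

For analysis of the second term on the right-hand side of (60), first consider the term $E_\theta[(\hat\xi_k - \omega_k - \xi_{\mathrm{mid}})_+]^2$. Let $\xi_{k,1} = \sum_{i=m_0+1}^{m_k}\theta_i^2$, $\xi_{k,2} = \sum_{i=m_k+1}^{m_J}\theta_i^2$, $\hat\xi_{k,1} = \big(\sum_{i=m_0+1}^{m_k}Y_i^2 - \lambda_k\big)_+$ and $\hat\xi_{k,2} = \sum_{i=m_k+1}^{m_J}[(Y_i^2 - \tau_{k,i}/n)_+ - \mu_{k,i}]$. Note that it follows from the elementary inequality $(x+y)_+ \le (x)_+ + (y)_+$ for $x, y \in \mathbb{R}$ that

(61)  $$E_\theta[(\hat\xi_k - \omega_k - \xi_{\mathrm{mid}})_+]^2 \le 2E_\theta[(\hat\xi_{k,1} - \xi_{k,1})_+]^2 + 2E_\theta[(\hat\xi_{k,2} - \omega_k - \xi_{k,2})_+]^2.$$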



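For the analysis of the first of these terms note that

$$[(\hat\xi_{k,1} - \xi_{k,1})_+]^2 = \Big[\Big(\sum_{i=m_0+1}^{m_k}Y_i^2 - \lambda_k - \xi_{k,1}\Big)_+\Big]^2 = \Big[\Big(\frac1n\sum_{i=m_0+1}^{m_k}z_i^2 - \lambda_k + 2n^{-1/2}\sum_{i=m_0+1}^{m_k}\theta_iz_i\Big)_+\Big]^2.$$

It then follows from the inequality

(62)  $$[(x+y)_+]^2 \le 2[(x)_+]^2 + 2y^2$$

for any real numbers $x$ and $y$ that

$$E_\theta[(\hat\xi_{k,1} - \xi_{k,1})_+]^2 \le 2E\Big[\Big(\frac1n\sum_{i=m_0+1}^{m_k}z_i^2 - \lambda_k\Big)_+\Big]^2 + 8n^{-1}E\Big[\sum_{i=m_0+1}^{m_k}\theta_iz_i\Big]^2 = 2n^{-2}E\Big[\Big(\sum_{i=m_0+1}^{m_k}z_i^2 - n\lambda_k\Big)_+\Big]^2 + 8n^{-1}\xi_{k,1}.$$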
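
The two elementary positive-part inequalities used in this step, $(x+y)_+ \le (x)_+ + (y)_+$ and (62), can be spot-checked on a grid of real numbers. This is only a sanity-check sketch; the grid and the variable names are illustrative choices, not part of the paper.

```python
def pos(x):
    """Positive part (x)_+."""
    return x if x > 0 else 0.0

# Grid of test points in [-10, 10] with step 1/4 (illustrative choice).
grid = [i / 4.0 for i in range(-40, 41)]

# (x + y)_+ <= (x)_+ + (y)_+
ok_subadditive = all(
    pos(x + y) <= pos(x) + pos(y) + 1e-12 for x in grid for y in grid
)

# Inequality (62): [(x + y)_+]^2 <= 2 [(x)_+]^2 + 2 y^2
ok_62 = all(
    pos(x + y) ** 2 <= 2 * pos(x) ** 2 + 2 * y ** 2 + 1e-12
    for x in grid
    for y in grid
)
print(ok_subadditive, ok_62)
```

Inequality (62) follows from $(x+y)_+ \le (x)_+ + |y|$ together with $(a+b)^2 \le 2a^2 + 2b^2$, which is why only the first argument keeps its positive part.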
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Set m = rrik — mo- It then follows from Theorem 2.1 of [21] that 



E 



vi=mo+l J 



(2\lm, log m + m \ ^ „ , , , 

V 2y mlogm + 2 / 



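where $\chi^2_m$ is a central chi-square random variable with $m$ degrees of freedom. It then follows from Lemma 2 of [8] on the tail probability bounds of the chi-square distribution and by noting $\log(1+x) \le x - \frac12x^2 + \frac13x^3$ for all $x \ge 0$ that

$$P(\chi^2_m > m + 2\sqrt{m\log m}) \le \frac12\exp\Big\{-\frac{m}{2}\Big[2\sqrt{\frac{\log m}{m}} - \log\Big(1 + 2\sqrt{\frac{\log m}{m}}\Big)\Big]\Big\} \le \frac12\exp\Big(-\log m + \frac{4(\log m)^{3/2}}{3m^{1/2}}\Big) \le \frac3m,$$

where the last inequality follows from the fact that $(\log m)^{3/2}/m^{1/2}$ attains its maximum at $m = e^3$ and hence $\frac12\exp\big(\frac{4(\log m)^{3/2}}{3m^{1/2}}\big) \le 3$. Therefore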



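$$E\Big[\Big(\sum_{i=m_0+1}^{m_k}z_i^2 - n\lambda_k\Big)_+\Big]^2 \le \frac{24}{\log m}$$

and hence

(63)  $$E_\theta[(\hat\xi_{k,1} - \xi_{k,1})_+]^2 \le \frac{48}{n^2\log(m_k - m_0)} + \frac{8\xi_{k,1}}{n}.$$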
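
The chi-square tail step can be checked numerically. The sketch below is illustrative and rests on stated assumptions: it reads the tail bound as $P(\chi^2_m > m + 2\sqrt{m\log m}) \le 3/m$, it only covers even $m$ (where the tail has an exact Poisson-sum form), and the function names are hypothetical. It also confirms that $(\log m)^{3/2}/m^{1/2}$ peaks at $m = e^3$ and that the resulting constant stays below 3.

```python
import math

def chi2_sf_even(x, m):
    """Exact P(chi2_m > x) for even m: exp(-x/2) * sum_{j < m/2} (x/2)^j / j!."""
    if m <= 0 or m % 2 != 0:
        raise ValueError("m must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Work in log space so large m does not overflow or underflow.
    logs = [-half + j * math.log(half) - math.lgamma(j + 1) for j in range(m // 2)]
    top = max(logs)
    return math.exp(top) * sum(math.exp(v - top) for v in logs)

def threshold(m):
    """The cutoff m + 2 * sqrt(m * log m)."""
    return m + 2.0 * math.sqrt(m * math.log(m))

def g(m):
    """(log m)^{3/2} / m^{1/2}."""
    return math.log(m) ** 1.5 / math.sqrt(m)

peak = g(math.exp(3.0))
half_exp_at_peak = 0.5 * math.exp(4.0 * peak / 3.0)
peak_ok = all(g(m) <= peak + 1e-12 for m in range(2, 100001))
tail_ok = all(chi2_sf_even(threshold(m), m) <= 3.0 / m for m in range(2, 2001, 2))
print(half_exp_at_peak, peak_ok, tail_ok)
```

The check is restricted to even degrees of freedom purely to keep the code free of external libraries; it is a plausibility check, not a proof.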



We now turn to $E_\theta[(\hat\xi_{k,2} - \omega_k - \xi_{k,2})_+]^2$ and show that the following bound holds.

Lemma 4. For some constant $C > 0$ and for all sufficiently large $n$,

(64)  $$E_\theta[(\hat\xi_{k,2} - \omega_k - \xi_{k,2})_+]^2 \le Cn^{-1}(\log n)^{-5/4} + 4\xi_{k,2}n^{-1}\log_2 n.$$



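Proof. Set $D_{k,2} = E_\theta[(\hat\xi_{k,2} - \omega_k - \xi_{k,2})_+]^2$. Note that

$$\hat\xi_{k,2} = \sum_{i=m_k+1}^{m_J}\Big[\Big(Y_i^2 - \frac{\tau_{k,i}}{n}\Big)_+ - \mu_{k,i}\Big] = \sum_{j=1}^{J-k}\sum_{i=m_{j+k-1}+1}^{m_{j+k}}\Big[\Big(Y_i^2 - \frac{2j}{n}\Big)_+ - \mu_{k,i}\Big].$$

Set $\eta_j = \sum_{i=m_{j+k-1}+1}^{m_{j+k}}\theta_i^2$, $1 \le j \le J-k$. It follows from Lemma 2 that for a fixed value of $\eta_j$ on a block $m_{j+k-1}+1 \le i \le m_{j+k}$, $\sum_{i=m_{j+k-1}+1}^{m_{j+k}}\big(Y_i^2 - \frac{2j}{n}\big)_+$ is stochastically maximized when $\theta_{m_{j+k-1}+1} = \eta_j^{1/2}$ and the remaining $\theta_i = 0$.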
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Hence 



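$$D_{k,2} \le E\Big[\Big(\sum_{j=1}^{J-k}\Big\{\Big(\eta_j + 2\eta_j^{1/2}n^{-1/2}z_{m_{j+k-1}+1} + \frac1n z_{m_{j+k-1}+1}^2 - \frac{2j}{n}\Big)_+ - \mu_{k,m_{j+k-1}+1}\Big\} + \sum_{j=1}^{J-k}\sum_{i=m_{j+k-1}+2}^{m_{j+k}}\Big[\Big(\frac1n z_i^2 - \frac{2j}{n}\Big)_+ - \mu_{k,i}\Big] - \omega_k - \xi_{k,2}\Big)_+\Big]^2.$$

Noting $\sum_{j=1}^{J-k}\eta_j = \xi_{k,2}$, it then follows from the fact that $(x+y)_+ \le (x)_+ + (y)_+$ and (62) that

(65)  $$D_{k,2} \le E\Big[\Big(\sum_{j=1}^{J-k}2\eta_j^{1/2}n^{-1/2}(z_{m_{j+k-1}+1})_+ + \sum_{j=1}^{J-k}\sum_{i=m_{j+k-1}+1}^{m_{j+k}}\Big[\Big(\frac1n z_i^2 - \frac{2j}{n}\Big)_+ - \mu_{k,i}\Big] - \omega_k\Big)_+\Big]^2$$
$$\le 2n^{-2}E\Big\{\Big(\sum_{j=1}^{J-k}\sum_{i=m_{j+k-1}+1}^{m_{j+k}}[(z_i^2 - 2j)_+ - n\mu_{k,i}] - n\omega_k\Big)_+\Big\}^2 + 2n^{-1}E\Big\{\sum_{j=1}^{J-k}2\eta_j^{1/2}(z_{m_{j+k-1}+1})_+\Big\}^2.$$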
It is easy to see that the second term satisfies

(66)  $$2n^{-1}E\Big\{\sum_{j=1}^{J-k}2\eta_j^{1/2}(z_{m_{j+k-1}+1})_+\Big\}^2 \le 2n^{-1}J\sum_{j=1}^{J-k}4\eta_jE\{(z_{m_{j+k-1}+1})_+\}^2 = 4J\xi_{k,2}n^{-1}.$$

We now use Lemmas 1 and 3 to bound $E\{(\sum_{j=1}^{J-k}\sum_{i=m_{j+k-1}+1}^{m_{j+k}}[(z_i^2 - 2j)_+ - n\mu_{k,i}] - n\omega_k)_+\}^2$. First note that equation (37) in Lemma 1 yields that
Var((zf - 2i)+) < [16(2i)-i/2 - 9(2jr + 9(2j)"'/'] • (2^)^^/2g-^- 
and consequently 

^n=E E (Var(z2 - 2i)+) 

j=l i=mj+fc_i+l 
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where the last step follows from the fact that 

which can be verified by direct calculations. It then follows from Lemma 3 with $\gamma = 4$ that for all sufficiently large $n$

(67)  $$E\Big\{\Big(\sum_{j=1}^{J-k}\sum_{i=m_{j+k-1}+1}^{m_{j+k}}[(z_i^2 - 2j)_+ - n\mu_{k,i}] - n\omega_k\Big)_+\Big\}^2 \le Cm_k(\log n)^{3/4}n^{-1},$$

where $C > 0$ is a constant. Noting $m_k \le m_J \le \frac{n^2}{(\log n)^2}$ and $J \le \log_2 n$, (65), (66) and (67) together yield that

$$E_\theta[(\hat\xi_{k,2} - \omega_k - \xi_{k,2})_+]^2 \le Cm_k(\log n)^{3/4}n^{-3} + 4J\xi_{k,2}n^{-1} \le Cn^{-1}(\log n)^{-5/4} + 4\xi_{k,2}n^{-1}\log_2 n$$

and Lemma 4 is thus proved. □

We now return to the proof of Theorem 2. Lemma 4 together with (61) and (63) yield that for all sufficiently large $n$

$$\sum_{k=1}^{J}E_\theta[(\hat\xi_k - \omega_k - \xi_{\mathrm{mid}})_+]^2 \le CJn^{-2}(\log n)^{-1} + CJn^{-1}(\log n)^{-5/4} + 4J\xi_{\mathrm{mid}}n^{-1}\log_2 n$$
$$\le C\{n^{-2} + n^{-1}(\log n)^{-1/4} + \xi_{\mathrm{mid}}n^{-1}(\log n)^2\}.$$

It then follows from (49) for $p \ge 2$ and (50) for $p < 2$ that

(68)  $$\sup_{\theta\in\Theta}\sum_{k=1}^{J}E_\theta[(\hat\xi_k - \omega_k - \xi_{\mathrm{mid}})_+]^2 = o(n^{-1})$$

and is thus negligible relative to the minimax risk.

The rest of the proof is now straightforward. It follows from (59) and (60) 
that 

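(69)  $$E_\theta(\hat Q - Q(\theta))^2 \le E_\theta(\hat\xi_0 - \xi_0)^2 + 2E_\theta(\hat\xi_{k_*} - \omega_{k_*} - \xi_{\mathrm{mid}})^2 + 2\sum_{k=1}^{J}E_\theta[(\hat\xi_k - \omega_k - \xi_{\mathrm{mid}})_+]^2 + 2E_\theta(\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2$$
$$\le E_\theta(\hat\xi_0 - \xi_0)^2 + 4E_\theta(\hat\xi_{k_*} - \xi_{\mathrm{mid}})^2 + 4\omega_{k_*}^2 + 2\sum_{k=1}^{J}E_\theta[(\hat\xi_k - \omega_k - \xi_{\mathrm{mid}})_+]^2 + 2E_\theta(\hat\xi_{\mathrm{tail}} - \xi_{\mathrm{tail}})^2.$$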

The remainder of the proof can be separated into two cases, $\alpha p_* \le \frac12$ and $\alpha p_* > \frac12$. First consider the case when $\alpha p_* \le \frac12$. In this case it follows from the definition of $m_{k_*}$ given in (23) that

(70)  $$\omega_{k_*}^2 = \frac{36m_{k_*}\log n}{n^2} \le 72n^{-(2-p_*/(1+2p_*s_*))}(\log n)^{2p_*s_*/(1+2p_*s_*)}.$$

For this case the theorem now immediately follows from (69), (68), (70) and (56)-(58).

For the case $\alpha p_* > \frac12$ first note that $k_* = 1$, $m_{k_*} = 2m_0$ and hence $\omega_{k_*}^2 = o(n^{-1})$. The theorem then immediately follows in this case from this observation, (69), (68) and (56)-(58).

4.4. Proof of Theorem 1. We divide the proof into two cases, $p \ge 2$ and $0 < p < 2$, which correspond to the cases $p_* = 2$ and $p_* < 2$, respectively. The case where $p \ge 2$ is standard but we include a brief outline for the sake of completeness. In this case we apply Theorem 2.1 of [14] combined with Theorem 4 of [11]. Let

Lo{S) = sup|q(0) : £ < ^^ ^ e Lp{a, M)| 

be the modulus of continuity introduced in [13] . For small 5 let ~ (4«+i) _ 
Let 9 = {01,92, . . .), where 9i = c5(2°+i)/(4«+i) for i = 1, . . . , n and otherwise 
9i = 0. It is easy to check that 9 £ Lp{a,M) for sufficiently small c > 0. 
Simple calculations then show that uj{6) > for some D > 0. 

It then follows from Theorem 2.1 of [14] and Theorem 4 of [11] that if Q 
satisfies (10) then 

(71) snpiEeQ-Q{9)f>uid^)-2VCr^-'r/Ud^)r^'^\ 
060 \ n J \ n J 

Equation (11) follows by taking a sufficiently small d. 

We now turn to the case where $p < 2$ and $\alpha < \frac{1}{2p}$, in which case $p_* = p$ and $s_* = s$. The proof follows a similar argument to one given in [10] for minimax lower bounds. The main idea is to place a prior on the union of the zero vector and the vertices of a suitable collection of hypercubes. The constrained risk inequality given in Theorem 4 of [11] can then be used to yield a lower bound for the maximum mean squared error over the vertices of the hypercubes given an upper bound on the mean squared error at the origin.

More precisely, let $\Theta_{k,m}$ be the union of the zero vector $\theta_0 = (0, 0, \ldots)$ and the collection of vectors which have exactly $k$ nonzero coordinates equal to $n^{-1/2}$ in the first $m$ coordinates and are otherwise equal to zero. It is straightforward to check that $\Theta_{k,m} \subset L_p(\alpha,M)$ when $m = n^{p/(1+2ps)}(\log n)^{-1/(1+2ps)}$ and $k = \sqrt{bm\log m}$ for a sufficiently small constant $b > 0$.

As in [10] let $\mathcal{I}(k,m)$ be the class of all subsets of $\{1, \ldots, m\}$ of $k$ elements and for $I \in \mathcal{I}(k,m)$ let $\theta_I \in \Theta_{k,m}$ be the vector where the $j$th coordinate is zero if $j \notin I$ and is equal to $n^{-1/2}$ for $j \in I$.

Let $\psi_\mu$ be the density of a normal distribution with mean $\mu$ and variance $\frac1n$. And for $I \in \mathcal{I}(k,m)$ let $g_I(y_1, \ldots, y_m) = \prod_{j=1}^{m}\psi_{\mu_j}(y_j)$, where $\mu_j = n^{-1/2}1(j \in I)$. Finally let $g = \binom{m}{k}^{-1}\sum_{I\in\mathcal{I}(k,m)}g_I$ and $f$ be the density of $m$ independent normal random variables each with mean 0 and variance $\frac1n$. Note that a similar mixture prior was used in [2] to give lower bounds in a nonparametric testing problem.

The application of the constrained risk inequality of [11] requires an upper bound on the chi-squared distance between $f$ and $g$. Cai and Low [10] show that $\int g^2/f = Ee^{J}$, where $J$ has the hypergeometric distribution $P(J = j) =$
\j)\k-jl_ j^Q^g ^j^gj^ pg^gg ^^Yiat 



(■;) 



For k = \/bm logm, (1 — ^)~^' < e^^/™ < and it follows that 

,2 / i.\ k 



(72) j9_<m{l + {e-l)^ < 



^g(e-l)fc /m < ^. 



The constrained risk inequality in Theorem 4 of [11] then yields that if for 
any e > 0, Ef{Q - Q{9o)f < then for 6 < f 



(73) 



E,Q-kn >^-2mW2^!!^ = ^(i + o(i)) 



bmlogm 
2 + 



n 

Hence for some constant Ci > 

inf sup {EeQ-Q{e)f 

Q e&Lp{a,M) 



>mf sup {EeQ-Q{e)f>^-^^ 



Q 9ee 



k.m 
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